Parquet: Cache adjacent identical metadata in variant reader#16852

Open
nssalian wants to merge 3 commits into apache:main from nssalian:variant-metadata-caching

@nssalian nssalian commented Jun 17, 2026

Collaborator

Summary

VariantMetadataReader now caches the last parsed VariantMetadata so adjacent rows with identical metadata bytes skip the re-parse.

Test plan

  • testMetadataCacheHits: 50 rows of identical metadata; reference identity proves the cached instance is reused.
  • testMetadataCacheInvalidatesByLength: alternating metadata of different lengths; the length pre-check forces invalidation.
  • testMetadataCacheInvalidatesByBytes: pairs of metadata with the same length but different bytes; bufferEquals must discriminate by content.
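
For reviewers who want the caching idea in isolation, here is a minimal sketch of a single-entry cache keyed on the raw metadata bytes. It is not the code in this PR: `CachingMetadataReader`, `ParsedMetadata`, `parseCount`, and the UTF-8 "parse" are hypothetical stand-ins, and `ByteBuffer.equals` plays the role of the `bufferEquals` check named in the test plan.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the reader described above; not the Iceberg implementation.
final class CachingMetadataReader {

  // Hypothetical parsed form. The real VariantMetadata is richer than a single string.
  static final class ParsedMetadata {
    final String text;

    ParsedMetadata(String text) {
      this.text = text;
    }
  }

  private ByteBuffer lastBytes = null;       // private copy of the last metadata bytes seen
  private ParsedMetadata lastParsed = null;  // parse result for lastBytes
  private int parseCount = 0;                // counts real parses, for illustration only

  ParsedMetadata read(ByteBuffer bytes) {
    // Cheap length pre-check first, then a content comparison of the remaining bytes.
    if (lastBytes != null
        && lastBytes.remaining() == bytes.remaining()
        && lastBytes.equals(bytes)) {
      return lastParsed; // cache hit: the same instance is returned
    }

    // Cache miss: keep a private copy so later reuse of the caller's buffer
    // cannot silently change the cache key.
    ByteBuffer copy = ByteBuffer.allocate(bytes.remaining());
    copy.put(bytes.duplicate()); // duplicate() leaves the caller's position untouched
    copy.flip();

    lastBytes = copy;
    lastParsed = parse(bytes);
    parseCount++;
    return lastParsed;
  }

  int parseCount() {
    return parseCount;
  }

  // Placeholder "parse": decodes the bytes as UTF-8 text.
  private static ParsedMetadata parse(ByteBuffer bytes) {
    return new ParsedMetadata(StandardCharsets.UTF_8.decode(bytes.duplicate()).toString());
  }
}
```

In this sketch, feeding the same bytes repeatedly returns one instance and increments `parseCount` once, while a change in length or in any byte triggers a fresh parse. Those are the three behaviours the test plan exercises.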

@nssalian nssalian marked this pull request as ready for review June 17, 2026 18:56
@nssalian nssalian requested review from Fokko and huaxingao June 17, 2026 18:56
Comment thread parquet/src/test/java/org/apache/iceberg/parquet/TestVariantReaders.java Outdated

2 participants